feat(block_service): add JuiceFS support as block storage backend #2384

Open
ruojieranyishen wants to merge 5 commits into apache:master from ruojieranyishen:supportjuicefsblockservice

ruojieranyishen (Collaborator) commented Mar 11, 2026

#2383

This commit adds support for JuiceFS as a block service backend in Pegasus. A JuiceFS provider is recognized by its jfs:// prefix and reuses the existing hdfs_service implementation for file operations.
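To illustrate the reuse described above, here is a minimal sketch, not the PR's actual code: a type registry in which a JuiceFS type name maps to the same factory as HDFS. The names block_filesystem, hdfs_like_service, service_registry, and the "juicefs_service" type string are hypothetical stand-ins; only hdfs_service comes from the description.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for the block service interface.
struct block_filesystem {
    virtual ~block_filesystem() = default;
    virtual std::string backend_name() const = 0;
};

// Hypothetical stand-in for the HDFS-backed implementation.
struct hdfs_like_service : block_filesystem {
    std::string backend_name() const override { return "hdfs"; }
};

class service_registry {
public:
    using factory = std::function<std::unique_ptr<block_filesystem>()>;

    void register_type(const std::string &type, factory f) {
        assert(f != nullptr); // an empty factory would be a programming error
        _factories[type] = std::move(f);
    }

    // Returns nullptr when the type was never registered.
    std::unique_ptr<block_filesystem> create(const std::string &type) const {
        auto it = _factories.find(type);
        if (it == _factories.end()) {
            return nullptr;
        }
        return it->second();
    }

private:
    std::map<std::string, factory> _factories;
};

// Both type names share one factory, so JuiceFS goes through the HDFS code path.
inline service_registry make_registry() {
    service_registry r;
    service_registry::factory make_hdfs = [] {
        return std::unique_ptr<block_filesystem>(new hdfs_like_service());
    };
    r.register_type("hdfs_service", make_hdfs);
    r.register_type("juicefs_service", make_hdfs); // assumed type name
    return r;
}
```

The point of the sketch is that no JuiceFS-specific file operations are needed as long as the HDFS-compatible client handles the jfs:// scheme.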

Todo

  1. Support reading RocksDB checkpoint files from JuiceFS in Pegasus Spark.


acelyc111-bot left a comment

Review: JuiceFS block storage support

Summary: Adds JuiceFS as a new block storage backend by reusing the HDFS service implementation. Detects JuiceFS providers via jfs:// prefix pattern.

What's good:

  • Clever approach: leverages the existing HDFS service, since JuiceFS provides an HDFS-compatible interface
  • Provider detection logic (is_juicefs_provider) validates format correctly (prefix, @, non-empty host)
  • Clean separation of concerns for args handling between JuiceFS and standard providers
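For readers without the diff open, the three checks named above (prefix, @, non-empty host) might look roughly like the following. This is a sketch of the described behavior, not the PR's code: the function body and the kJfsPrefix name are assumptions.

```cpp
#include <cassert>
#include <string>

// Assumed constant name; the PR may spell the prefix inline.
static const std::string kJfsPrefix = "jfs://";

// Accepts strings shaped like jfs://<user>@<host>.
// Checks only what the review lists: prefix, an '@', and a non-empty host.
inline bool is_juicefs_provider(const std::string &provider) {
    if (provider.compare(0, kJfsPrefix.size(), kJfsPrefix) != 0) {
        return false; // wrong or missing prefix
    }
    const std::string::size_type at = provider.find('@', kJfsPrefix.size());
    if (at == std::string::npos) {
        return false; // no user/host separator
    }
    return at + 1 < provider.size(); // something must follow the '@'
}

// Small usage example.
inline void example_usage() {
    assert(is_juicefs_provider("jfs://user@host"));
    assert(!is_juicefs_provider("dfs://user@host"));
}
```

Under these three checks alone, jfs://@host is accepted even though the user part is empty.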

Issues / Suggestions:

  1. Hardcoded jar version in pack_server.sh: juicefs-hadoop-1.3.1.jar is downloaded with wget -N at build time. This will fail in air-gapped build environments. Consider making the version configurable or bundling it as a dependency.

  2. jfs:// prefix is a magic string — Extract to a constant (e.g., JUICEFS_PROVIDER_PREFIX) alongside BLOCK_SERVICE_JUICEFS.

  3. No unit tests for is_juicefs_provider — The validation logic is simple but worth testing edge cases like jfs:// (no user), jfs://user@ (empty host), jfs://@host (empty user), dfs://user@host (wrong prefix).

  4. Parser throws away parsed components — The function validates user@host format but then reconstructs args from the raw provider string. The parsed user could be validated further (e.g., non-empty after prefix).

  5. Test script uses example placeholder — The package_dir="example: ..." in run.sh looks like documentation rather than a working config. Should this be a parameter or env var default?
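Suggestions 2 to 4 could be combined along these lines: a named prefix constant, a parser that returns the user and host instead of discarding them, and the edge cases from suggestion 3 as checks. This is a sketch only. parse_juicefs_provider and juicefs_target are hypothetical names, JUICEFS_PROVIDER_PREFIX is the name suggested in item 2, and plain assert stands in for the project's test framework.

```cpp
#include <cassert>
#include <string>

// Constant name as suggested in item 2.
static const char JUICEFS_PROVIDER_PREFIX[] = "jfs://";

// Hypothetical holder for the parsed components.
struct juicefs_target {
    std::string user;
    std::string host;
};

// Parses jfs://<user>@<host>. Returns false on any malformed input,
// including an empty user or an empty host. 'out' may be nullptr when
// the caller only needs validation.
inline bool parse_juicefs_provider(const std::string &provider, juicefs_target *out) {
    const std::string prefix(JUICEFS_PROVIDER_PREFIX);
    if (provider.compare(0, prefix.size(), prefix) != 0) {
        return false;
    }
    const std::string::size_type at = provider.find('@', prefix.size());
    if (at == std::string::npos) {
        return false;
    }
    const std::string user = provider.substr(prefix.size(), at - prefix.size());
    const std::string host = provider.substr(at + 1);
    if (user.empty() || host.empty()) {
        return false;
    }
    if (out != nullptr) {
        out->user = user;
        out->host = host;
    }
    return true;
}

// Edge cases listed in item 3, written as plain assertions.
inline void check_edge_cases() {
    juicefs_target t;
    assert(parse_juicefs_provider("jfs://user@host", &t));
    assert(t.user == "user" && t.host == "host");
    assert(!parse_juicefs_provider("jfs://", nullptr));          // no user
    assert(!parse_juicefs_provider("jfs://user@", nullptr));     // empty host
    assert(!parse_juicefs_provider("jfs://@host", nullptr));     // empty user
    assert(!parse_juicefs_provider("dfs://user@host", nullptr)); // wrong prefix
}
```

Returning the parsed pieces lets the caller build args from validated values rather than from the raw provider string.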

Verdict: ⚠️ Request Changes — The hardcoded jar download and lack of unit tests for the detection logic should be addressed before merge.
